AITopics | moe layer

BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive.
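For readers new to the architecture, here is a generic top-k routed MoE feed-forward layer in NumPy. It also has an optional `from_dense` constructor that copies one trained dense FFN into every expert, which is the general idea behind starting an MoE from a dense model instead of from scratch. This is a minimal sketch under stated assumptions and not the method of the paper excerpted above. The class and method names are hypothetical. Experts are assumed to be two-layer ReLU FFNs. The gates are a softmax over the k selected router logits, which is one of several common routing variants.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class MoELayer:
    """Hypothetical top-k routed mixture of feed-forward experts (illustrative sketch)."""

    def __init__(self, d_model, d_hidden, num_experts, k=2, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.k = k
        # Router: one logit per expert for every token.
        self.w_router = rng.normal(0.0, 0.02, size=(d_model, num_experts))
        # Assumption: each expert is a two-layer ReLU FFN with weights W1 and W2.
        self.w1 = rng.normal(0.0, 0.02, size=(num_experts, d_model, d_hidden))
        self.w2 = rng.normal(0.0, 0.02, size=(num_experts, d_hidden, d_model))

    @classmethod
    def from_dense(cls, w1_dense, w2_dense, num_experts, k=2, rng=None):
        """Initialise every expert as a copy of one trained dense FFN."""
        d_model, d_hidden = w1_dense.shape
        layer = cls(d_model, d_hidden, num_experts, k, rng)
        layer.w1 = np.repeat(w1_dense[None], num_experts, axis=0)
        layer.w2 = np.repeat(w2_dense[None], num_experts, axis=0)
        return layer

    def __call__(self, x):
        # x: (num_tokens, d_model)
        logits = x @ self.w_router                        # (T, E)
        topk = np.argsort(logits, axis=-1)[:, -self.k:]   # indices of the k largest logits
        topk_logits = np.take_along_axis(logits, topk, axis=-1)
        # Softmax over the selected experts only, so each token's k gates sum to 1.
        gates = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
        gates /= gates.sum(axis=-1, keepdims=True)
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            for slot in range(self.k):
                e = topk[t, slot]
                h = relu(x[t] @ self.w1[e])
                out[t] += gates[t, slot] * (h @ self.w2[e])
        return out
```

All experts in the upcycled layer start out identical, and each token's k gates sum to 1. As a result, `MoELayer.from_dense(w1, w2, num_experts=4)` initially reproduces the dense FFN output `relu(x @ w1) @ w2`. The experts would only diverge once training routes different tokens to them and updates them separately.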

artificial intelligence, natural language, proceedings

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.59)

48237d9f2dea8c74c2a72126cf63d933-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 17:30:55 GMT

arxiv preprint arxiv, machine learning, natural language

Neural Information Processing Systems

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

2f00ecd787b432c1d36f3de9800728eb-Paper-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 07:44:39 GMT

arxiv preprint arxiv, machine learning, natural language

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

b8f10193cab43d45df9bb810637333fd-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 17:47:42 GMT

large language model, machine learning, sparsity

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Michigan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications (0.98)

dc192b3eeffebba21bd1d82f6752b84b-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 11:37:03 GMT

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

91edff07232fb1b55a505a9e9f6c0ff3-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-10-2026, 19:31:03 GMT

inequality, moe, nullw

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Towards Understanding the Mixture-of-Experts Layer in Deep Learning

Neural Information Processing Systems, Feb-10-2026, 19:30:59 GMT

This motivates us to consider a challenging classification problem with intrinsic cluster structures.
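This toy example shows why intrinsic cluster structure can favour a routed mixture over a single model. It relies on a made-up, XOR-style data set and is not the data model analysed in the paper. There are two clusters, identified by `x[0] = +1` or `-1`. Within each cluster the label is linear in `x[1]`, but the direction flips between clusters. The expert assignment is hand-written (oracle routing on `x[0]`), whereas a real MoE router would have to learn it. All function names are hypothetical.

```python
import itertools

import numpy as np

# Hypothetical toy data: cluster is given by x[0]; inside each cluster the
# label depends linearly on x[1], but with opposite sign in the two clusters.
X = np.array([[1.0, 1.0],
              [1.0, -1.0],
              [-1.0, 1.0],
              [-1.0, -1.0]])
y = np.array([1, -1, -1, 1])


def predict_linear(X, w, b):
    """Linear classifier: +1 if w.x + b > 0, else -1."""
    return np.where(X @ w + b > 0, 1, -1)


def best_single_linear_accuracy(X, y, grid=np.linspace(-2.0, 2.0, 9)):
    """Brute-force the best accuracy of ONE linear classifier over a (w0, w1, b) grid."""
    best = 0.0
    for w0, w1, b in itertools.product(grid, repeat=3):
        acc = np.mean(predict_linear(X, np.array([w0, w1]), b) == y)
        best = max(best, acc)
    return best


def routed_experts_accuracy(X, y):
    """Route each input to a per-cluster linear expert chosen by its cluster feature x[0]."""
    experts = {1.0: (np.array([0.0, 1.0]), 0.0),    # cluster +1: y = sign(x1)
               -1.0: (np.array([0.0, -1.0]), 0.0)}  # cluster -1: y = -sign(x1)
    preds = np.array([predict_linear(x[None], *experts[x[0]])[0] for x in X])
    return np.mean(preds == y)
```

Here `best_single_linear_accuracy(X, y)` returns 0.75, because no single line separates XOR-labelled points. By contrast, `routed_experts_accuracy(X, y)` returns 1.0, since each cluster is linearly separable on its own.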

artificial intelligence, arxiv preprint arxiv, machine learning

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

A Training details

Neural Information Processing Systems, Feb-8-2026, 12:07:04 GMT

Models were trained with 32 experts, with experts placed every 2 layers, except where explicitly stated. The learned contrastive temperature parameter is initialised at 10. We train models at batch size 16,384 for 781,250 steps at resolution 224. These are B/16 models trained for 100,000 steps at batch size 8,192. The default training data is mixed with data from JFT-4B in a 3:1 ratio.
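The training-detail numbers can be cross-checked with a little arithmetic. The config object here is hypothetical, with class and field names of our own, not taken from the paper. It assumes a 12-layer encoder as in ViT-B. It also assumes that "every 2 layers" means the second layer of each pair hosts the experts, and that the 3:1 ratio is default data to JFT-4B. The excerpt quotes two schedules, 16,384 × 781,250 steps and, for B/16 models, 8,192 × 100,000 steps; the defaults use the first.

```python
from dataclasses import dataclass


@dataclass
class MoETrainingConfig:
    """Hypothetical summary of the quoted training settings."""
    num_layers: int
    num_experts: int = 32
    moe_every: int = 2          # an MoE layer every 2 layers
    batch_size: int = 16_384
    steps: int = 781_250
    mix_ratio: tuple = (3, 1)   # assumed order: (default data, JFT-4B)

    def moe_layer_indices(self):
        # Assumption: the last layer of each group of `moe_every` layers is the MoE layer.
        return [i for i in range(self.num_layers) if i % self.moe_every == self.moe_every - 1]

    def examples_seen(self):
        # Total training examples processed = batch size x number of steps.
        return self.batch_size * self.steps

    def per_batch_split(self):
        # Split one batch according to the mixing ratio (integer counts).
        a, b = self.mix_ratio
        n_first = self.batch_size * a // (a + b)
        return n_first, self.batch_size - n_first


cfg = MoETrainingConfig(num_layers=12)
print(cfg.moe_layer_indices(), cfg.examples_seen(), cfg.per_batch_split())
```

Under these assumptions, the long schedule sees 12.8 billion examples (16,384 × 781,250) and the B/16 schedule sees 819.2 million (8,192 × 100,000). A 3:1 split of a 16,384-example batch gives 12,288 and 4,096 examples.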

artificial intelligence, machine learning, text token

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
